
Introduction

Explore and Summarize Data is a project that explores a red wine dataset of 1,599 observations and 13 variables.

##  [1] "X"                    "fixed.acidity"        "volatile.acidity"    
##  [4] "citric.acid"          "residual.sugar"       "chlorides"           
##  [7] "free.sulfur.dioxide"  "total.sulfur.dioxide" "density"             
## [10] "pH"                   "sulphates"            "alcohol"             
## [13] "quality"
##        X          fixed.acidity   volatile.acidity  citric.acid   
##  Min.   :   1.0   Min.   : 4.60   Min.   :0.1200   Min.   :0.000  
##  1st Qu.: 400.5   1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090  
##  Median : 800.0   Median : 7.90   Median :0.5200   Median :0.260  
##  Mean   : 800.0   Mean   : 8.32   Mean   :0.5278   Mean   :0.271  
##  3rd Qu.:1199.5   3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420  
##  Max.   :1599.0   Max.   :15.90   Max.   :1.5800   Max.   :1.000  
##  residual.sugar     chlorides       free.sulfur.dioxide
##  Min.   : 0.900   Min.   :0.01200   Min.   : 1.00      
##  1st Qu.: 1.900   1st Qu.:0.07000   1st Qu.: 7.00      
##  Median : 2.200   Median :0.07900   Median :14.00      
##  Mean   : 2.539   Mean   :0.08747   Mean   :15.87      
##  3rd Qu.: 2.600   3rd Qu.:0.09000   3rd Qu.:21.00      
##  Max.   :15.500   Max.   :0.61100   Max.   :72.00      
##  total.sulfur.dioxide    density             pH          sulphates     
##  Min.   :  6.00       Min.   :0.9901   Min.   :2.740   Min.   :0.3300  
##  1st Qu.: 22.00       1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500  
##  Median : 38.00       Median :0.9968   Median :3.310   Median :0.6200  
##  Mean   : 46.47       Mean   :0.9967   Mean   :3.311   Mean   :0.6581  
##  3rd Qu.: 62.00       3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300  
##  Max.   :289.00       Max.   :1.0037   Max.   :4.010   Max.   :2.0000  
##     alcohol         quality     
##  Min.   : 8.40   Min.   :3.000  
##  1st Qu.: 9.50   1st Qu.:5.000  
##  Median :10.20   Median :6.000  
##  Mean   :10.42   Mean   :5.636  
##  3rd Qu.:11.10   3rd Qu.:6.000  
##  Max.   :14.90   Max.   :8.000

I first loaded the data, then printed the variable (column) names and the summary statistics for each variable, shown above.
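The summary table above comes from R's `summary()`. As an illustration only (the project itself was written in R), the minimal Python sketch below computes the same six values; the function name `r_style_summary` is hypothetical. It assumes R's default quartile rule (type 7), which Python's `statistics.quantiles` reproduces with `method="inclusive"`. Applied to the X column, which is just the row index from 1 to 1599, it should give the same values as the first column of the summary.

```python
import statistics


def r_style_summary(values):
    """Return Min., 1st Qu., Median, Mean, 3rd Qu. and Max. like R's summary().

    Quartiles use method="inclusive", which interpolates the same way as
    R's default quantile type 7.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "Min.": data[0],
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(data),
        "3rd Qu.": q3,
        "Max.": data[-1],
    }


# X is the row index column, so it runs from 1 to 1599.
print(r_style_summary(range(1, 1600)))
```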

Univariate Plots Section

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   5.636   6.000   8.000

The quality histogram is roughly normally distributed with a mean of 5.636; I also summarized the data (above) to confirm that the chart is correct.
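Because quality is a whole-number score, each bar of its histogram is simply the number of wines with that score. The Python sketch below (an illustration, not the R code used in the project; `quality_counts` is a hypothetical name) counts scores, applied here to the first ten quality values printed by `str()` further down this report rather than to the full dataset.

```python
from collections import Counter


def quality_counts(scores):
    """Count the wines at each quality score, ordered by score (one bar per score)."""
    return dict(sorted(Counter(scores).items()))


# The first ten quality values printed by str() in this report.
first_ten = [5, 5, 5, 6, 5, 5, 5, 7, 7, 5]
print(quality_counts(first_ten))  # {5: 7, 6: 1, 7: 2}
```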

Fixed acidity is skewed to the right (mean 8.32, median 7.90).

Volatile acidity is also skewed to the right.

Citric acid does not look normally distributed.

Residual sugar is skewed to the right, with a long tail up to 15.5.

Chlorides are skewed to the right.

Free sulfur dioxide is skewed to the right.

Total sulfur dioxide is skewed to the right.

Density looks approximately normally distributed (mean 0.9967, median 0.9968).

pH also looks approximately normally distributed.

Sulphates are skewed to the right.

Alcohol is skewed to the right.
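The skew judgements above were made by eye from the histograms. One way to put a number on them is the moment-based skewness coefficient; I did not compute it in this project, so the Python sketch below (hypothetical function `sample_skewness`, made-up inputs) is only an illustration. A positive value indicates a longer right tail, and a value near zero suggests a roughly symmetric shape like the one seen for density and pH.

```python
def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2 ** 1.5.

    Positive: longer right tail (right-skewed).
    Near zero: roughly symmetric.
    """
    n = len(values)
    if n == 0:
        raise ValueError("no values given")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return m3 / m2 ** 1.5


# Made-up illustrative data, not values from the wine dataset.
print(sample_skewness([1, 2, 3]))         # 0.0 (symmetric)
print(sample_skewness([1, 1, 1, 2, 10]))  # positive (long right tail)
```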

The chart above shows the new variable I created, alcohol.Quality, which groups each wine's quality score into three levels: low, medium, and excellent. The medium level contains the most wines.
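The exact cut-offs behind alcohol.Quality are not listed in this report. The `str()` output below shows that scores 5 and 6 map to the second level and 7 to the third, so the Python sketch assumes hypothetical cut-offs of 4 or below for low, 5 to 6 for medium, and 7 or above for excellent; the level labels follow the prose above, not necessarily the exact R factor labels. In R, `cut()` with `ordered_result = TRUE` is one way to build such an ordered factor; the Python below only mimics the resulting integer codes.

```python
# Hypothetical cut-offs, chosen only to agree with the str() output in this
# report (5 and 6 -> level 2, 7 -> level 3); the real ones are not stated.
LEVELS = ["low", "medium", "excellent"]


def quality_level(score):
    """Map a quality score to an ordered level."""
    if score <= 4:
        return "low"
    if score <= 6:
        return "medium"
    return "excellent"


def level_codes(scores):
    """1-based level codes, like the integers str() prints for an ordered factor."""
    return [LEVELS.index(quality_level(s)) + 1 for s in scores]


first_ten = [5, 5, 5, 6, 5, 5, 5, 7, 7, 5]
print(level_codes(first_ten))  # [2, 2, 2, 2, 2, 2, 2, 3, 3, 2]
```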

## 'data.frame':    1599 obs. of  14 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : int  5 5 5 6 5 5 5 7 7 5 ...
##  $ alcohol.Quality     : Ord.factor w/ 3 levels "low"<"Medium"<..: 2 2 2 2 2 2 2 3 3 2 ...

Univariate Analysis

What is the structure of your dataset?

My dataset contains 1,599 observations of 13 variables, plus one variable that I created, for 14 variables in total.

What is/are the main feature(s) of interest in your dataset?

The main feature of interest in my dataset is the quality of the wine.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Sulphates and pH.

Did you create any new variables from existing variables in the dataset?

I created alcohol.Quality.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

No, there weren't any unusual distributions.

Bivariate Plots Section

I start with scatter plots.

Citric acid and fixed acidity are positively related: wines with more citric acid tend to have higher fixed acidity.

Density and fixed acidity are positively related: denser wines tend to have higher fixed acidity.

pH and fixed acidity are negatively related: the lower the pH, the higher the fixed acidity.

Total sulfur dioxide and free sulfur dioxide are positively related: wines with more total sulfur dioxide tend to have more free sulfur dioxide.
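The direction of each relationship above can also be checked numerically with the Pearson correlation coefficient (`cor()` in R). The Python sketch below is an illustration with made-up data, not output from this dataset; `pearson_r` is a hypothetical name. A value near +1 means a strong positive linear relationship, near -1 a strong negative one.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: +1 perfectly positive, -1 perfectly negative, 0 no linear trend."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant sequence has no correlation")
    return cov / (sx * sy)


# Made-up data: the first y rises with x, the second falls as x rises.
x = [1, 2, 3]
print(pearson_r(x, [2, 4, 6]))  # approximately 1.0
print(pearson_r(x, [6, 4, 2]))  # approximately -1.0
```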

The box plot shows that excellent-quality wines have the lowest median density.

The box plot shows that excellent-quality wines have the highest median alcohol content.
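The middle line of each box in these box plots is the median of the variable within one quality level. The Python sketch below (illustrative only; `median_by_group` is a hypothetical name) computes those medians. It uses just the first ten rows from the `str()` output, with level labels taken from the prose above, so its numbers only demonstrate the calculation and are not the medians of the full dataset.

```python
import statistics
from collections import defaultdict


def median_by_group(groups, values):
    """Median of `values` within each group: the middle line of each box in a box plot."""
    buckets = defaultdict(list)
    for group, value in zip(groups, values):
        buckets[group].append(value)
    return {group: statistics.median(vals) for group, vals in buckets.items()}


# The first ten rows from str() in this report: alcohol.Quality level and alcohol.
levels = ["medium"] * 7 + ["excellent"] * 2 + ["medium"]
alcohol = [9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5]
print(median_by_group(levels, alcohol))
```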

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

Positive relationships:

- citric acid and fixed acidity
- density and fixed acidity
- total sulfur dioxide and free sulfur dioxide

Negative relationship:

- pH and fixed acidity

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

pH and density, and residual sugar and density.

What was the strongest relationship you found?

Density and fixed acidity showed the strongest relationship.

Multivariate Plots Section

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

From the previous graphs, alcohol and residual.sugar appear to be important for the quality of the wine.

Were there any interesting or surprising interactions between features?

Alcohol and citric.acid.


Final Plots and Summary

Plot One

Description One

The histogram shows that good-quality wines make up about 80% of the dataset, and the distribution is roughly normal, centred on scores of 5 and 6.
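A percentage like the 80% above is the fraction of wines whose score falls in the chosen range. The Python sketch below assumes that "good quality" means a score of 5 or 6; that definition and the name `share_of` are my assumptions for illustration. It is applied to the first ten quality values from `str()` only; the real percentage has to be computed on all 1,599 wines.

```python
def share_of(scores, accepted=(5, 6)):
    """Fraction of `scores` whose value is in `accepted`."""
    scores = list(scores)
    if not scores:
        raise ValueError("no scores given")
    return sum(1 for s in scores if s in accepted) / len(scores)


# The first ten quality values from str() in this report; 8 of the 10 are 5 or 6.
first_ten = [5, 5, 5, 6, 5, 5, 5, 7, 7, 5]
print(share_of(first_ten))  # 0.8
```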

Plot Two

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.20   10.42   11.10   14.90

Description Two

The histogram shows that alcohol is skewed to the right: the mean (10.42) is higher than the median (10.20).

Plot Three


Description Three

The box plot shows that excellent-quality wines have the lowest median density.


Reflection

In the Explore and Summarize Data project I worked with the R language for the first time. It is easy to understand but slightly tricky, so you need to be careful. Overall it was a good experience to learn something new and immediately apply it in a project like this.

This dataset contains 1,599 observations of 14 variables (including the one I created). Step by step, I realized that alcohol concentration is related to quality and density, so I created a new variable, alcohol.Quality, to group the wines into quality levels. I created it because quality is what matters most about the product itself, the wine. In the near future I hope to analyse more deeply how quality is affected by the other chemical variables, using more statistical methods.